Application-Specific Memory Subsystems

Authors

Joseph George Wingbermuehle

Roger D. Chamberlain

Kunal Agrawal

Ron K. Cytron

Viktor Gruev

Krishna Kavi

Hiro Mukai

Abstract of the Dissertation
Application-Specific Memory Subsystems
by
Joseph G. Wingbermuehle
Doctor of Philosophy in Computer Science
Washington University in St. Louis, 2015
Professor Roger D. Chamberlain, Chair

The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose: they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for any particular application. General-purpose memory subsystems are desirable when the workload is unknown and the memory subsystem must remain fixed; when this is not the case, a custom memory subsystem may be beneficial. For example, an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application would benefit from a custom memory subsystem optimized for that application. In addition, when the memory subsystem has tunable parameters, it may make sense to change these parameters depending on the application being run. This situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will support greater flexibility in the memory subsystem in the future.

In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. We also show how to discover such memory subsystems automatically by applying a superoptimization technique to memory address traces gathered from applications, which allows a custom memory subsystem to be generated with little effort.

We next show that our memory subsystem superoptimization technique can optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to main memory, which is useful for main memories with limited write endurance, such as flash or Phase-Change Memory (PCM). Finally, we show how to superoptimize memory subsystems for streaming applications, a class of parallel applications. In particular, we show that, using ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we demonstrate actual performance improvements from the superoptimized memory subsystem in applications implemented in hardware.
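The core idea of searching for an application-specific memory subsystem can be illustrated with a toy model: simulate each candidate subsystem on an address trace, score it under a chosen objective, and keep the best. The sketch below is not the dissertation's implementation. It assumes a single write-back, write-allocate, LRU set-associative cache, a hypothetical latency model (`HIT_CYCLES`, `MISS_CYCLES`), a hypothetical capacity budget, and a trace of `(is_write, address)` pairs; the names `CacheConfig`, `simulate`, and `search` are illustrative. A brute-force enumeration over a tiny space of cache geometries stands in for the dissertation's superoptimizer, whose actual search strategy and design space are not described here. Switching the objective from total cycles to main-memory writes mirrors the write-minimization goal for flash or PCM.

```python
from collections import OrderedDict
from dataclasses import dataclass
from itertools import product

# Hypothetical latency model, for illustration only.
HIT_CYCLES = 1
MISS_CYCLES = 100


@dataclass(frozen=True)
class CacheConfig:
    line_size: int  # bytes per cache line
    sets: int       # number of sets
    ways: int       # lines per set (associativity)


@dataclass
class Stats:
    cycles: int = 0      # total modeled access time
    mem_writes: int = 0  # dirty lines written back to main memory


def simulate(cfg: CacheConfig, trace) -> Stats:
    """Run an (is_write, address) trace through a write-back,
    write-allocate cache with LRU replacement (an assumed design)."""
    # One OrderedDict per set: tag -> dirty flag, least recently used first.
    cache = [OrderedDict() for _ in range(cfg.sets)]
    stats = Stats()
    for is_write, addr in trace:
        line = addr // cfg.line_size
        entries = cache[line % cfg.sets]
        tag = line // cfg.sets
        if tag in entries:
            stats.cycles += HIT_CYCLES
            entries.move_to_end(tag)  # mark as most recently used
        else:
            stats.cycles += MISS_CYCLES
            if len(entries) == cfg.ways:
                _, dirty = entries.popitem(last=False)  # evict the LRU line
                if dirty:
                    stats.mem_writes += 1
            entries[tag] = False  # allocate the line, initially clean
        if is_write:
            entries[tag] = True  # write-back: only mark the line dirty
    # Dirty lines still resident must be flushed to main memory at the end.
    stats.mem_writes += sum(d for s in cache for d in s.values())
    return stats


def search(trace, budget_bytes, objective):
    """Exhaustively score every cache geometry within a capacity budget
    and return the one that minimizes objective(stats)."""
    candidates = [
        CacheConfig(ls, ns, nw)
        for ls, ns, nw in product([16, 32, 64], [1, 2, 4, 8], [1, 2, 4])
        if ls * ns * nw <= budget_bytes
    ]
    return min(candidates, key=lambda c: objective(simulate(c, trace)))


if __name__ == "__main__":
    # Toy trace: repeated writes to two addresses 16 bytes apart.
    demo = [(True, 0), (True, 16), (True, 0), (True, 16)]
    print(search(demo, 32, lambda st: st.cycles))      # fastest geometry
    print(search(demo, 32, lambda st: st.mem_writes))  # fewest memory writes
```

Because every candidate is scored against the same trace, changing the optimization goal only changes the `objective` passed to `search`. A realistic trace or design space would call for a smarter search than exhaustive enumeration, and `search` raises `ValueError` if no geometry fits the budget.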


Similar Resources

Superoptimizing Memory Subsystems for Multiple Objectives

We consider the automatic determination of application-specific memory subsystems via superoptimization, with the goals of reducing memory access time and of minimizing writes. The latter goal is of concern for memories with limited write endurance. Our subsystems outperform general-purpose memory subsystems in terms of performance, number of writes, or both.

User-Level Management of Kernel Memory

Kernel memory is a resource that must be managed carefully in order to ensure the efficiency and safety of the system. The use of an inappropriate management policy can weaken the isolation between subsystems, lead to suboptimal performance, and even make the kernel vulnerable to denial-of-service attacks. Yet, many existing kernels use only a single built-in policy, which is always a compromis...

Dissociable neural subsystems underlie visual working memory for abstract categories and specific exemplars.

An ongoing debate concerns whether visual object representations are relatively abstract, relatively specific, both abstract and specific within a unified system, or abstract and specific in separate and dissociable neural subsystems. Most of the evidence for the dissociable subsystems theory has come from experiments that used familiar shapes, and the usage of familiar shapes has allowed for a...

Analysis of Multithreaded Multiprocessors with Distributed Shared Memory

In this paper we propose an analytical model, based on multi-chain closed queuing networks, to evaluate the performance of multithreaded multiprocessors. The queuing network is solved by using approximate Mean Value Analysis. Unlike earlier work which modeled individual subsystems in isolation, our work models processor, memory and network subsystems in an integrated manner. Such an approach b...

A Heterogeneous Multiprocessor Architecture for Flexible Media Processing

© 2002 IEEE, July–August 2002. New media applications such as high-definition digital television, set-top boxes with time-shift functionality, 3D games, video conferencing, and MPEG-4 interactivity have generated a demand for increasingly flexible consumer electronics products. These products are evolving into multifunctional devices that combine a set of media applications. Th...

Publication date: 2015